The significance of machine learning in neonatal screening for inherited metabolic diseases

Background Neonatal screening for inherited metabolic diseases (IMDs) has been revolutionized by tandem mass spectrometry (MS/MS). This study aimed to enhance neonatal screening for IMDs using machine learning (ML) techniques. Methods The study involved the analysis of a comprehensive dataset comprising 309,102 neonatal screening records collected in the Ningbo region, China. An advanced ML system model, encompassing nine distinct algorithms, was employed for the purpose of predicting the presence of 31 different IMDs. The model was compared with traditional cutoff schemes to assess its diagnostic efficacy. Additionally, 180 suspected positive cases underwent further evaluation. Results The ML system exhibited a significantly reduced positive rate, from 1.17% to 0.33%, compared to cutoff schemes in the initial screening, minimizing unnecessary recalls and associated stress. In suspected positive cases, the ML system identified 142 true positives with high sensitivity (93.42%) and improved specificity (78.57%) compared to the cutoff scheme. While false negatives emerged, particularly in heterozygous carriers, our study revealed the potential of the ML system to detect asymptomatic cases. Conclusion This research provides valuable insights into the potential of ML in pediatric medicine for IMD diagnosis through neonatal screening, emphasizing the need for accurate carrier detection and further research in this domain.


Introduction
Inherited metabolic diseases (IMDs) comprise a group of genetic disorders, including amino acid, organic acid, and fatty acid disorders (1). The application of tandem mass spectrometry (MS/MS) in neonatal screening has revolutionized the early identification of IMDs by analyzing and interpreting amino acids and acylcarnitines (2). At present, MS/MS technology enables the screening of approximately 50 metabolites, facilitating the detection of over 20 IMDs (3). However, the comprehensive metabolite measurement involved in MS/MS screening comes with limitations, particularly false-positive (FP) and false-negative (FN) results (4)(5)(6)(7). The consequences of such inaccuracies are significant: FPs subject families to unnecessary stress and healthcare costs, whereas FNs delay vital treatment.
Computational and machine learning (ML) methodologies provide a promising approach for analyzing high-dimensional data (8,9). Recent applications of ML techniques have extended to neonatal screening for the diagnosis of IMDs, improving screening sensitivity and specificity (10,11). ML techniques also have the potential to expedite the diagnosis of IMDs. In a previous study conducted by our collaborative partners, 9 ML algorithms were employed to predict 16 IMDs (12). As screening and diagnostic data have accumulated, the ML system model has become able to predict the presence of 31 IMDs (13). The present study aimed to comprehensively evaluate the diagnostic efficacy of the ML system model in clinical practice using neonatal screening data from the Ningbo area (China).

Patients' data
The study population consisted of 309,102 neonatal screening records collected from the Central Laboratory of Birth Defects Prevention and Control Affiliated with the Ningbo Women and Children's Hospital (Ningbo, China) between July 2014 and March 2020. Within this dataset, a total of 3,608 cases were recalled because of positive initial screening results. Subsequently, 398 cases exhibited abnormal metabolite concentrations or metabolite concentration ratios during secondary screening, indicating that they were potential cases of IMDs. Among these, 180 cases underwent next-generation sequencing (NGS) to confirm IMDs, while the remaining 218 suspected cases did not proceed with NGS for various reasons, including normal results from subsequent tandem mass spectrometry and urine organic acid tests, or parental refusal of NGS testing. Although these patients were included in the ML analysis, the lack of genetic testing reports precludes a definitive exclusion of disease. Therefore, our analysis of diagnostic efficiency focused solely on cases with a clear genetic diagnosis, to ensure the integrity and reliability of our findings. The study flowchart is depicted in Figure 1.

MS/MS analysis
Quantification of amino acids and acylcarnitines in dried blood spots (DBS) was performed using Xevo TQD tandem mass spectrometers (Waters Corp., Milford, MA, USA) in conjunction with the NeoBase Non-derivatized MSMS kit (PerkinElmer, Helsinki, Finland). The analysis included 11 amino acids, covering the metabolites associated with the investigated amino acid disorders. In addition, 31 acylcarnitines and succinylacetone were included because of their relevance to organic acidemias and fatty acid oxidation defects. Reference intervals for each metabolite were defined as the 0.5th to 99.5th percentile range, using nonparametric ranking to account for the non-normal distribution of our dataset. The intervals are refined as the sample pool expands, with the aim of reducing false positives and false negatives. Detailed information regarding the analyzed metabolites and their corresponding reference intervals can be found in Table S1.
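The percentile-based reference interval described above can be illustrated with a minimal sketch. It assumes the nearest-rank definition of a percentile, which is only one of several nonparametric ranking conventions and is not confirmed as the one used in this study. The function names and the demo concentrations are hypothetical and are not study data.

```python
import math


def nearest_rank_percentile(sorted_values, pct):
    """Return the nearest-rank percentile of an ascending list of values.

    Assumption: nearest-rank convention (smallest value whose rank covers
    at least pct percent of the data). Other rank-based conventions exist.
    """
    if not sorted_values:
        raise ValueError("at least one value is required")
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def reference_interval(values, lower_pct=0.5, upper_pct=99.5):
    """Nonparametric reference interval between two percentiles."""
    ordered = sorted(values)
    return (
        nearest_rank_percentile(ordered, lower_pct),
        nearest_rank_percentile(ordered, upper_pct),
    )


def is_outside_interval(value, interval):
    """Flag a concentration that falls below or above the reference interval."""
    low, high = interval
    return value < low or value > high


# Synthetic, illustrative concentrations (1.0 .. 1000.0), not study data.
demo_concentrations = [float(v) for v in range(1, 1001)]
demo_interval = reference_interval(demo_concentrations)
print(demo_interval)                               # (5.0, 995.0)
print(is_outside_interval(996.0, demo_interval))   # True
print(is_outside_interval(500.0, demo_interval))   # False
```

Because the method ranks observations instead of fitting a distribution, it makes no normality assumption, which matches the motivation given in the text.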

Genetic testing and bioinformatics analysis
Genomic DNA was extracted from DBS or peripheral blood obtained from patients using the OMEGA Genomic DNA Extraction Kit (OMEGA Biotech, United States). Subsequently, targeted sequencing was conducted using the basic edition panel of IMDs (Genuine Diagnostic Laboratory, Hangzhou, China) to detect 94 genes, including SLC22A5, PAH, PTS, MUT, and other relevant genes. Target region sequences were enriched through multiple probe hybridizations using the Agilent SureSelect Human Exon Sequence Capture Kit. Following enrichment, capture products were purified using Agencourt AMPure XP beads (Beckman Coulter). After purification and quality testing, the sequencing libraries were quantified using the Illumina DNA standard and Primer Premix Kit (Kapa) and subsequently subjected to massively parallel sequencing on the Illumina MiSeq platform. All potentially pathogenic variants were validated through Sanger sequencing using specific primers. Polymerase chain reaction (PCR) conditions followed the TaKaRa LA PCR™ Kit Ver. 2.1 (TaKaRa). The trans status of all compound heterozygous variants was determined. Identified variants were checked against databases such as the Human Gene Mutation Database (HGMD), ClinVar, the ExAC consortium, gnomAD, the 1000 Genomes Project database, the laboratory's internal database (∼20,000 mutations), and relevant literature. Novel missense variants were further assessed for potential pathogenicity using tools integrated into VarSome, including SIFT, PolyPhen-2, and MutationTaster. Variant classification followed the standards and guidelines set forth by the American College of Medical Genetics and Genomics (ACMG).
Standardized median multiple (multiple of the median, MoM) methods were applied, in which each original concentration was divided by the median of the corresponding biochemical indicator, to eliminate the influence of regional and laboratory differences. We then trained the disease model by combining the MoM, gestational week, neonatal blood collection interval, neonatal weight, and corresponding IMDs.
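The median-based standardization can be sketched as follows, assuming the usual multiple-of-the-median (MoM) definition in which each concentration is divided by the median for that analyte. The function name and the two toy laboratories are hypothetical and serve only to show why the transformation removes a constant scale difference between sites.

```python
from statistics import median


def multiples_of_median(concentrations):
    """Express each concentration as a multiple of the analyte's median (MoM)."""
    analyte_median = median(concentrations)
    if analyte_median <= 0:
        raise ValueError("median must be positive to compute MoM")
    return [value / analyte_median for value in concentrations]


# Hypothetical values: lab B reports the same samples on a 10x larger scale.
lab_a = [1.0, 2.0, 4.0]
lab_b = [10.0, 20.0, 40.0]

mom_a = multiples_of_median(lab_a)
mom_b = multiples_of_median(lab_b)
print(mom_a)            # [0.5, 1.0, 2.0]
print(mom_a == mom_b)   # True: the scale difference between labs disappears
```

After this step, a value of 2.0 means "twice the typical level" in either laboratory, which is what allows records from different regions or instruments to be combined with covariates such as gestational week and birth weight.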

ML system model
The ML framework utilized in this study was previously established and published by our collaborative partners (12,13). Finally, we used our ML technique to design an easy-to-operate Web-based screening system for neonatal metabolic diseases. This system was designed to assess the risk of specific IMDs in each screening dataset, with high- and low-risk cases classified as positive and negative, respectively.

Neonatal screening and children with positive diagnosis
In our research, a primary screening of 309,102 neonates led to the identification of 3,608 infants with positive screening results, [...]

TABLE 1
The list of inherited metabolic disorders in machine learning system model.

ML system model vs. the reference interval in the initial screening
To assess the effectiveness of the ML system model, this study used extensive neonatal screening data. We first compared the positive rate in the initial screening between the ML system model and pediatricians using a predefined reference interval. Based on the reference interval, 3,608 newborns received a positive result and 305,494 were categorized as negative during the initial screening phase, a positive rate of 1.17%. In contrast, the ML system model identified 1,029 positive cases and 308,073 negative cases in the initial screening, a positive rate of 0.33%. The positive rate of initial screening thus decreased significantly, from 1.17% with the reference interval to 0.33% with the ML system model. This could minimize false positives in the initial screening, which cause unnecessary family stress, and potentially enable more cost-effective screening.
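The positive rates quoted above follow directly from the reported counts, as the short sketch below shows. Only the counts come from the text; the function and variable names are illustrative.

```python
TOTAL_SCREENED = 309_102  # neonatal screening records in the study


def positive_rate(positives, total=TOTAL_SCREENED):
    """Initial-screening positive rate as a percentage of all screened newborns."""
    return 100 * positives / total


reference_positives = 3_608  # flagged by the reference interval
ml_positives = 1_029         # flagged by the ML system model

reference_rate = positive_rate(reference_positives)
ml_rate = positive_rate(ml_positives)
fewer_flagged = reference_positives - ml_positives

print(f"Reference interval: {reference_rate:.2f}%")   # 1.17%
print(f"ML system model:    {ml_rate:.2f}%")          # 0.33%
print(f"Fewer initial positives: {fewer_flagged}")    # 2579
```

In absolute terms, the ML system model flagged 2,579 fewer newborns than the reference interval in the initial screening.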

Diagnostic efficiency of ML system model for suspected positive cases
To assess the potential of ML to enhance diagnostic efficiency, we used data from 180 suspected positive cases with NGS results. The ML system model identified 142 true positives (TPs) and 22 true negatives (TNs), whereas the reference interval identified 144 TPs and 2 TNs. In addition, the ML system model produced 6 FPs and 10 FNs, while the reference interval produced 26 FPs and 8 FNs.
The ML system model significantly reduced the number of FPs from 26 to 6; the detailed numbers of TNs, TPs, FNs, and FPs are presented in Table 2. The sensitivity of the ML system model and the reference interval was 93.42% and 94.74%, respectively. The specificity of the ML system model was 78.57%, while that of the reference interval was 7.14%. Furthermore, the ML system exhibited a positive predictive value (PPV) of 95.95% and a negative predictive value (NPV) of 68.75%, whereas the reference interval yielded a PPV of 84.71% and an NPV of 20%. The ML system model therefore showed a marked increase in both PPV and NPV, as well as a higher accuracy. The values of sensitivity, specificity, PPV, NPV, and accuracy are summarized in Table 3.
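As a check on the reported figures, the metrics can be recomputed from the four counts given for each method using the standard confusion-matrix definitions. The class and variable names below are illustrative. The accuracy values are not quoted in the text and are derived here from the same counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for one screening method, judged against NGS results."""
    tp: int
    fp: int
    tn: int
    fn: int

    def metrics(self):
        """Return the standard diagnostic metrics as percentages."""
        total = self.tp + self.fp + self.tn + self.fn
        return {
            "sensitivity": 100 * self.tp / (self.tp + self.fn),
            "specificity": 100 * self.tn / (self.tn + self.fp),
            "ppv": 100 * self.tp / (self.tp + self.fp),
            "npv": 100 * self.tn / (self.tn + self.fn),
            "accuracy": 100 * (self.tp + self.tn) / total,
        }


# Counts reported for the 180 suspected positive cases with NGS results.
ml_system = ConfusionCounts(tp=142, fp=6, tn=22, fn=10)
reference_interval = ConfusionCounts(tp=144, fp=26, tn=2, fn=8)

for name, counts in (("ML system", ml_system),
                     ("Reference interval", reference_interval)):
    summary = ", ".join(f"{key} {value:.2f}%"
                        for key, value in counts.metrics().items())
    print(f"{name}: {summary}")
# ML system: sensitivity 93.42%, specificity 78.57%, ppv 95.95%, npv 68.75%, accuracy 91.11%
# Reference interval: sensitivity 94.74%, specificity 7.14%, ppv 84.71%, npv 20.00%, accuracy 81.11%
```

The recomputed sensitivity, specificity, PPV, and NPV agree with the values stated above. The low specificity of the reference interval reflects the small number of true negatives (2 of 28 NGS-negative cases) in this pre-selected cohort.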

Comparative analysis of ML system model and reference interval reveals misdiagnosis in suspected inherited metabolic disease cases
Diagnostic results varied across disorders when the ML system model was used (Table 3). Positive cases of BH4D, PCD, MMA, SCADD, IBDD, NICCD, CPT I, 2-MBDD, GA II, OTCD, and HARG were accurately predicted. Nonetheless, the ML system model misdiagnosed 10 cases: 3 cases of PAHD, 3 cases of 3-MCCD, 3 cases of HMET, and 1 case of MCADD, with FN rates of 13.04%, 7.32%, 33.33%, and 33.33%, respectively. The reference interval misdiagnosed a total of 8 cases, including 3 cases of PAHD, 1 case of PCD, 1 case of MMA, 1 case of 3-MCCD, 1 case of NICCD, and 1 case of CIT I, with FN rates of 13.04%, 3.33%, 16.67%, 2.44%, 33.33%, and 12.50%, respectively (Table 3). Notably, two cases of PAHD were missed by both methods. Table 4 presents an overview of all 16 misdiagnosed cases and their metabolic indices. Among them, 3 cases carried homozygous or compound heterozygous mutations, and the remaining 13 were heterozygous carriers. Misdiagnosis therefore occurred mainly among heterozygous carriers, regardless of whether the ML method or the reference interval scheme was used. In some heterozygous carriers, metabolic alterations were observed despite a low-risk designation by the ML system. Conversely, other heterozygous carriers with normal metabolic indicators were classified as high risk by the ML system.

Diagnostic accuracy and alterations in metabolic indicators in patients with homozygous and compound heterozygous mutations vs. heterozygous carriers
The ML system achieved a diagnostic accuracy of 97.22% in a cohort of 72 patients with homozygous and compound heterozygous mutations, and a diagnostic accuracy of 88.75% for 80 heterozygous carriers. Importantly, metabolic indicators differed significantly between patients harboring homozygous or compound heterozygous mutations and heterozygous carriers, with the former exhibiting more pronounced alterations.
Table 5 provides an overview of the diagnostic outcomes and alterations in metabolic indicators identified in patients with IMDs harboring homozygous and compound heterozygous mutations, as well as in heterozygous carriers.

Discussion
In our study, a total of 309,102 newborns underwent screening, leading to the identification of 152 cases with IMDs. The incidence [...] Zhu et al. (18) pioneered the development of a logistic regression ML model aimed at improving the diagnostic accuracy for phenylketonuria. Furthermore, Peng et al. (19) analyzed the performance of a Random Forest model for GA-I, MMA, OTCD, and VLCADD. In contrast, our investigation uses nine distinct algorithms that together enable the ML system model to predict 31 distinct IMDs. This broader strategy extends the scope of applicability and increases the clinical value of our study for IMD diagnosis and management.
ML techniques commonly exhibit enhanced predictive capabilities when applied to large datasets (20). In our study, the ML system notably reduced the initial positive screening rate compared with the traditional reference interval. This reduces the number of suspected cases and holds promise for reducing the need for patient recall.
In our investigation, the ML model was successfully deployed using data accumulated from the Ningbo region. The experimental outcomes provide evidence of the effectiveness of the ML system in the diagnosis of IMDs via neonatal screening. Compared with the traditional reference interval, our ML system exhibited a similar sensitivity (93.42% vs. 94.74%), thereby maintaining diagnostic performance. However, it notably improved specificity, from 7.14% to 78.57%. This improvement translated into a reduction in the number of FPs from 26 to 6, in line with prior research conducted by Peng et al. (19).
Furthermore, our ML system demonstrated a higher PPV of 95.95%, compared with 84.71% for the traditional reference interval. These findings are consistent with the work of Zhu et al. (18), who reported an increase in the PPV for PAHD from 19.14% to 32.16% through the application of an ML model. The substantial reduction in FPs and the concurrent increase in PPV underscore the improvement in screening efficiency achieved with ML methodologies.
However, FNs are a pivotal concern in any diagnostic framework and must be addressed. In the work of Tang et al. (21), four instances of NICCD were missed, illustrating the susceptibility to FN outcomes. Similarly, Lin et al. (22) reported the misdiagnosis of a MADD patient whose acylcarnitine levels were within the normal reference range upon recall. In our findings, the ML model missed 10 cases, while the traditional reference interval missed 8 cases. The ML system model showed a higher FN rate than the reference interval for 3-MCCD, HMET, and MCADD.
To elucidate the causes of these FNs in the ML system model, we analyzed the genetic results and metabolite concentrations of the missed cases. Of the cases missed by the ML model (3 cases of HMET, 3 cases of 3-MCCD, 3 cases of PAHD, and 1 case of MCADD), all except one PAHD case harboring a compound heterozygous mutation in PAH were heterozygous carriers. This observation suggests that heterozygous carriers are prone to misclassification by the ML system model and underscores the difficulty of carrier identification in diagnostic frameworks. This study represents an innovative application of ML techniques to the precise diagnosis of patients with IMDs harboring pathogenic mutations. Among the 80 patients classified as heterozygous carriers, the ML system accurately identified 71, indicating consistent performance across diverse metabolic indices. Notably, heterozygous carriers, particularly those bearing partially functional alleles, exhibited discernible variations in metabolic profiles compared with cases characterized by classical mutations (23)(24)(25). The presence of a single mutated allele altered the associated proteins and enzymes, producing variations in the relevant metabolic markers. Metabolite concentrations in heterozygous carriers were significantly lower than those in patients harboring homozygous or compound heterozygous mutations. This finding has important implications for how pediatricians handle heterozygous carriers in neonatal screening. Our findings indicate that heterozygous carriers may manifest variations in metabolic indicators that lead to misclassification. Addressing this challenge will require further research to refine ML algorithms, with specific emphasis on accurate carrier detection.
The differences in metabolite profiles observed in heterozygous carriers, especially in PAHD patients, may reflect variants in noncoding regions or deep introns that affect gene expression and, together with the detected variant, form compound heterozygous mutations affecting gene function. This possibility also highlights the limitations of targeted region capture sequencing in detecting variants in noncoding and deep intronic regions. Such undetected mutations would significantly affect metabolic profiles, suggesting that genetic interactions are more complex than previously understood. As NGS becomes more widely available, whole-exome and whole-genome sequencing may become more affordable and compensate for the shortcomings of targeted region capture. Given these limitations, ML is a valuable supplement. Its ability to analyze large datasets, including metabolic profiles and clinical measures, improves predictive accuracy and reduces diagnostic errors. Newborn screening could be made more effective by combining the pattern recognition and prediction capabilities of ML with the genetic insights provided by NGS.
To address the risk of missed diagnoses in newborns with initially normal screening results, as well as in those born prematurely or with low birth weight, our institution has implemented a seven-year follow-up program. The program aims to maximize the detection of late-onset symptoms of metabolic disorders that may not be evident at initial screening. This long-term monitoring ensures that children who show symptoms of IMDs are promptly identified and assessed, significantly reducing the risk of missed diagnoses. Through this comprehensive follow-up, our study addresses concerns regarding the limitations of DBS acquisition. The follow-up program was guided by the 'Expert Consensus on Chinese Pediatric Health Examination' to monitor and ensure the health of newborns (26). Two key assessments were initiated within the first month after discharge, and during the first year, children underwent quarterly examinations that included a history review, physical examination, physical measurements (e.g., height, weight, body-mass index, and head circumference), laboratory and imaging studies, and assessments of cognitive, neuromotor, developmental, language, hearing, vision, and dental health. Subsequently, assessments were performed biannually and then gradually reduced to annually. Children with an increased risk of metabolic diseases due to genetic factors should be examined and evaluated more intensively.
One notable case in our study involved a patient with MMA who harbored a homozygous mutation in the MUT gene (c.1663G>A) but showed no abnormal changes in metabolic indicators (C3 = 3.96 μmol/L). This highlights that certain IMDs may not produce obvious changes in MS/MS neonatal screening during the early stages, resulting in FN results. As more cases of this nature are identified, the capability of our ML system to identify asymptomatic cases will become increasingly evident, emphasizing its potential to improve the detection of such cases.
Furthermore, ML algorithms typically benefit from large datasets, as larger datasets theoretically result in improved predictive performance. As our model continues to be implemented in Ningbo, we anticipate that the use and development of ML algorithms will become increasingly widespread in the near future. This trend is expected to further enhance the prediction accuracy and computational performance of risk assessment models for neonatal IMDs.

Conclusions
In conclusion, our ML system showed promising effectiveness in pediatric diagnostic screening of IMDs. The model achieved a sensitivity of 93.42%, comparable to that of the reference interval (94.74%), and a specificity of 78.57%, far exceeding that of the reference interval (7.14%). Furthermore, the ML system demonstrated increased PPV and NPV. Notably, the ML system proved valuable in identifying carrier patients, providing novel insights into the application of ML in pediatric medical practice for diagnosing IMDs through neonatal screening.
The datasets generated and/or analysed during the current study are not publicly available due to the regulations of the Women and Children's Hospital of Ningbo University and the need to protect patients' personal information. Requests to access these datasets should be directed to the corresponding author HL, Email: lihaibo-775@163.com.

[...] blood samples, types of blood collection needles, succinylacetone treatment methods, normal population ranges, and interpretation rules.
b. Quality Control Data: This section comprised information such as quality control numbers, types of quality control, quality control batch numbers, amino acid internal standard batch numbers, acylcarnitine internal standard batch numbers, factory times, experimental times, and test values for each quality control analyte.
[...] review conclusions, sample numbers, screening times, blood collection dates, delivery dates, experimental dates, experimental methods (derivative or non-derivative), quality control numbers, and detection concentrations for each analyte.
d. Positive Case Data: This category covered screening numbers, confirmed diseases, urine organic acid tests, blood ammonia tests, blood gas analyses, routine blood tests, liver function tests, vitamin B12 tests, imaging examinations, and genetic test information.
e. Standardized median multiple (multiple of the median, MoM)

TABLE 2
Comparison of diagnostic performance metrics between the ML system model and reference interval in suspected cases.

TABLE 3
Diagnosis and false negative rates of different IMDs: comparison between ML system and reference interval for neonatal screening.

TABLE 4
The misdiagnosis results and genetic analysis for IMDs.

TABLE 5
ML system true positive rates and metabolic indicators in homozygous/compound heterozygous patients and heterozygous carriers for inherited metabolic disorders.